Received: from cheltenham.CS.Arizona.EDU by optima.CS.Arizona.EDU (5.65c/15) via SMTP id AA24317; Mon, 3 May 1993 00:35:45 MST
Date: Mon, 3 May 1993 00:35:43 MST
From: "Nick Kline" <kline>
Message-Id: <199305030735.AA16352@cheltenham.cs.arizona.edu>
Received: by cheltenham.cs.arizona.edu; Mon, 3 May 1993 00:35:43 MST
To: tsql
Subject: updated valid-time aggregate defs

I've updated the definitions concerning valid-time partitioning in
response to several comments. I eliminated the definitions for
partitioning attribute and value partitioning, since these are not
temporal database aspects. They were included before for completeness.

I have tried to explain more clearly the motivation for having an
*associated interval* with each valid-time temporal element (TE).
I actually partition the time-line into TEs and associate an interval
with each TE. The reason for this association is two-fold:

1) it's useful for the subdivision of the valid time-line to be a
   partitioning (in the mathematical sense)
2) it's very useful to allow overlapping intervals, and this is
   excluded by pt. 1 above

Defining the terms this way is very general, yet it allows succinct
definitions (excluding the long discussions!).

The revised definitions follow. Please contact me with any
correspondence.

Thanks,
Nick Kline
kline@cs.arizona.edu

% Document Type: LaTeX
\documentstyle[11pt]{article}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% VARIOUS MACROS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\long\def\comment#1{}
\newcommand{\entry}[1]{\subsubsection*{#1}}

\addtolength{\textwidth}{1.485in}%{1.2in}
\setlength{\oddsidemargin}{.1in}%{.3in}
\setlength{\evensidemargin}{.1in}%{.3in}
\addtolength{\topmargin}{-.85in} %{-1.35in}
\addtolength{\textheight}{1.8in} %{2.8in}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PAPER START
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}

\subsection{Valid-time Partitioning}

\entry{Definition}
{\em Valid-time partitioning} is the partitioning (in the mathematical
sense) of the valid time-line into {\em valid-time elements}. With each
valid-time element we associate an interval of the valid time-line, over
which a cumulative aggregate may then be applied.

\entry{Alternative Names}
Valid-time grouping.

\entry{Discussion}
To compute the aggregate, first partition the time-line into valid-time
elements, then associate an interval with each valid-time element,
assemble the tuples valid over each interval, and finally compute the
aggregate over each of these sets. The value at any event is the value
computed over the partitioning element that contains that event.

An interval is {\em associated} with each valid-time element because we
wish to perform a true {\em partition} of the valid time-line without
excluding certain queries. If aggregates could not be computed over
overlapping intervals, queries such as ``Find the average salary paid
for one year before each hire'' would be excluded, because the one-year
intervals before each hire might overlap.

Partitioning the time-line is a useful capability for aggregates in
temporal databases (+R1,+R3).
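
As a sketch of the procedure described in this discussion, in notation
that is assumed here for illustration and is not part of the glossary:
let $T$ be the valid time-line, $R$ the relation, $V(r)$ the valid time
of tuple $r$, $E_1, E_2, \ldots$ the valid-time elements, $I_i$ the
interval associated with $E_i$, and $f$ the aggregate. Reading ``valid
over'' as ``overlaps'' (also an assumption), the steps amount to
\[
  E_i \cap E_j = \emptyset \quad (i \neq j), \qquad \bigcup_i E_i = T,
\]
\[
  S_i = \{\, r \in R \mid V(r) \cap I_i \neq \emptyset \,\}, \qquad
  a(t) = f(S_i) \quad \mbox{for every event } t \in E_i .
\]
Only the $E_i$ are required to be disjoint; the $I_i$ are unconstrained,
so $I_i \cap I_j \neq \emptyset$ is permitted. This is what admits the
hire query, where each $I_i$ might be taken to be the year preceding the
$i$th hire.
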
Grouping is inappropriate because the valid-time elements form a true
partition; they do not overlap and must cover the time-line. However,
the associated intervals may be defined in any way.

One example of valid-time partitioning is to divide the time-line into
years, based on the Gregorian calendar. Then, for each year, compute the
count of the tuples which overlap that year.

There is no existing term for this concept. There is no partitioning
attribute in valid-time partitioning, since the partitioning depends not
on attribute values but on valid-times.

Valid-time partitioning may occur before or after value partitioning.

\subsection{Dynamic Valid-time Partitioning}

\entry{Definition}
In {\em dynamic valid-time partitioning} the valid-time elements used in
the partitioning are determined solely from the timestamps of the
relation.

\entry{Alternative Names}
Moving window.

\entry{Discussion}
The term dynamic is appropriate (as opposed to static) because if the
information in the database changes, the partitioning intervals may
change. The intervals are determined from intrinsic information.

One example of dynamic valid-time partitioning would be to compute the
average value of an attribute in a relation (say, the salary attribute)
over the year before the stop-time of each tuple. One technique for
computing this query is, for each tuple, to find all tuples valid in the
year before the stop-time of the tuple in question and combine these
tuples into a set. Finally, compute the average of the salary attribute
values in each set.

It may seem inappropriate to use valid-time elements instead of
intervals; however, there is no reason to exclude valid-time elements.
Perhaps the elements are the intervals during which the relation is
constant.

The existing term for this concept does not have an opposing term
suitable to refer to static valid-time partitioning, and cannot
distinguish between the two types of valid-time partitioning (-E3, +E9).
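
The salary example in this discussion can be written out as a sketch,
in notation assumed here for illustration only: let $R$ be the relation,
$V(s)$ the valid time of tuple $s$, $e_r$ the stop-time of tuple $r$,
and $\Delta$ a span of one year. Whether the interval includes its end
point is a further assumption. For each $r \in R$,
\[
  I_r = [\, e_r - \Delta,\; e_r \,), \qquad
  S_r = \{\, s \in R \mid V(s) \cap I_r \neq \emptyset \,\},
\]
\[
  \overline{\mbox{salary}}_r \;=\; \frac{1}{|S_r|} \sum_{s \in S_r} s.\mbox{salary} .
\]
Because each $I_r$ is built from a timestamp stored in $R$, updating $R$
can change the intervals themselves, which is the sense in which the
partitioning is {\em dynamic}.
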
Various temporal query languages have used both dynamic and static
valid-time partitioning, but have not always been clear about which type
of partitioning they support (+E1). Using these terms will remove this
ambiguity from future discussions.

\subsection{Static Valid-time Partitioning}

\entry{Definition}
In {\em static valid-time partitioning} the valid-time elements used are
determined solely from fixed points on a calendar, such as the start of
each year.

\entry{Alternative Names}
Moving window.

\entry{Discussion}
This term further distinguishes existing terms (-E3, +E9). It is an
obvious parallel to dynamic valid-time partitioning (+E1).

Static is an appropriate term because the valid-time elements are
determined from extrinsic information. The partitioning elements would
not change if the information in the database changed.

Computing the maximum salary of employees during each month is an
example which requires static valid-time partitioning. To compute this
information, first divide the time-line into valid-time elements, where
each element represents a separate month on, say, the Gregorian
calendar. Then find the tuples valid over each valid-time element, and
compute the maximum aggregate over the members of each set.

\subsection{Valid-time Cumulative Aggregation}

\entry{Definition}
In {\em cumulative aggregation}, for each valid-time element of the
valid-time partitioning (produced by either dynamic or static valid-time
partitioning), the aggregate is applied to all tuples associated with
that valid-time element. The value of the aggregate at any event is the
value computed over the partitioning element that contains that event.

\entry{Alternative Names}
Moving window.

\entry{Discussion}
{\em Cumulative} is used because the interesting values are defined over
a cumulative range of time (+E8). This term is more precise than the
existing term (-E3, +E9).
Instantaneous aggregation may be considered a degenerate case of
cumulative aggregation in which the partition is per chronon and the
associated interval is that chronon.

One example of cumulative aggregation would be to find the total number
of employees who had worked at some point for a company. To compute this
value at the end of each calendar year, define, for each year, a
valid-time element which is valid from the beginning of time up to the
end of that year. For each valid-time element, find all tuples which
overlap that element, and finally count the number of tuples in each
set.

\subsection{Instantaneous Aggregation}

\entry{Definition}
In {\em instantaneous aggregation}, for each chronon on the valid
time-line, the aggregate is applied to all tuples valid at that event.

\entry{Alternative Names}
None.

\entry{Discussion}
The term {\em instantaneous} is appropriate because the aggregate is
applied over every chronon, every event. It suggests an interest in the
aggregate value over a very small time interval, an instant, much as
acceleration is defined in physics over an infinitesimally small time
(+R3).

Many temporal query languages perform instantaneous aggregation, others
use cumulative aggregation, while still others use a combination of the
two. This term will be useful to distinguish between the various
alternatives, and is already used by some researchers (+R4,+E3).

\subsection{Gregorian Calendar}

\entry{Definition}
The {\em Gregorian calendar} is composed of 12 months, named, in order,
January, February, March, April, May, June, July, August, September,
October, November, and December. The 12 months form a year. A year is
either 365 or 366 days in length, where the extra day is used in ``leap
years.'' Leap years are defined as years evenly divisible by 4, with
centesimal years excluded unless they are divisible by 400.
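
Stated as a predicate on the year number $y$, the leap-year rule reads
as follows (a restatement only; the name $\mbox{leap}$ is introduced
here for illustration):
\[
  \mbox{leap}(y) \iff (y \bmod 4 = 0) \wedge
  \bigl( (y \bmod 100 \neq 0) \vee (y \bmod 400 = 0) \bigr).
\]
For example, 1992 and 2000 are leap years, while 1993 and 1900 are not:
$1900 \bmod 100 = 0$ but $1900 \bmod 400 = 300$.
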
Each month has a fixed number of days, except for February, the length
of which varies by a day depending on whether or not the particular year
is a leap year.

\entry{Alternative Names}
None.

\entry{Discussion}
The Gregorian calendar is widely used and accepted (+E3,+E7). This term
is defined and used elsewhere (-R1), but is in such common use in
temporal databases that it should be defined.

\end{document}